Abstract: Cooperative optimization is a new way of finding
global optima of complicated functions of many variables. It has
some important properties not possessed by conventional
optimization methods. It has been successfully applied to
many large-scale optimization problems in image processing,
computer vision, and computational chemistry. This paper shows
the application of this optimization principle to decoding LDPC
codes, which is another hard combinatorial optimization problem.
In our experiments, it significantly outperformed the sum-product
algorithm, the best known method for decoding LDPC
codes. Compared to the sum-product algorithm, our algorithm
reduced the error rate by a further factor of three, improved the
speed by a factor of six, and dramatically lowered the error floor in
decoding.

I. Introduction 

Decoding plays a very important role in modern data
communications. The best known decoding algorithm for
Turbo codes and LDPC codes is the sum-product
algorithm [2]. It is a message-passing algorithm operating on
a general graph. To the surprise of many mathematicians, we
have little theoretical understanding of its principles despite
its effectiveness. Although it has demonstrated satisfactory
performance in decoding and in solving many other optimization
problems, it may also give poor results or fail to converge.

This paper presents the application of cooperative optimization
to decoding LDPC codes. Cooperative optimization is a new
optimization principle that was previously unknown to the
mathematics and engineering communities. Like the sum-product
algorithm, the cooperative algorithm employs message passing on
a general graph. Unlike the sum-product algorithm, its
computational properties are better understood. While many
classic methods struggle with local minima, our method
always has a unique equilibrium and converges to it at
an exponential rate regardless of initial conditions. It can
determine whether the equilibrium is a global optimum. In
many important cases, it is guaranteed to find the global optimum
of difficult optimization problems where conventional methods
often fail.

Theories of optimization have been studied for centuries.
They attracted special attention after the invention of computers
because of their importance in solving many practical
problems by computer. Yet many effective optimization methods
were found not by applying the known optimization theories.
Instead, they are empirical methods discovered partly by chance,
just like the sum-product algorithm for decoding the Turbo codes
and LDPC codes used in data communications. This realization
calls on us to discover new principles for optimization and to build new
theories for them. We can expect better results from
deeper theoretical understanding than from discovering empirical
rules alone. Hopefully, the application of a new optimization
principle to decoding LDPC codes presented in this paper
supports this point of view.

II. The Cooperative Optimization Principle 

A. Basic Ideas 

To solve a hard problem, we follow the divide-and-conquer
principle. We first break up the problem into a number of
sub-problems of manageable sizes and complexities. Following
that, we assign each sub-problem to an agent, and ask those 
agents to solve the sub-problems in a cooperative way. The 
cooperation is achieved by asking each agent to compromise 
its solution with the solutions of others instead of solving 
the sub-problems independently. We can make an analogy 
with team playing, where the team members work together 
to achieve the best for the team, but not necessarily the best 
for each member. In many cases, cooperation of this kind can 
dramatically improve the problem-solving capabilities of the 
agents as a team, even when each agent may have very limited 
power. 

To be more specific, the cooperation is achieved in such a
multi-agent system via two vital steps executed iteratively by each
agent: 1) solving its sub-problem by soft decision
making, and 2) passing its soft decisions to its neighboring
agents. At the very beginning, each agent makes soft decisions
by solving its own sub-problem and ranking the solutions
in order of preference, measured by some values. For an
agent, the most preferable solution is the best solution to its
sub-problem, and the less preferable ones are sub-optimal
solutions to its sub-problem. Following that, each agent passes
its soft decisions as messages to its neighboring agents. After
receiving its neighbors' soft decisions, each agent goes
back to the soft decision making step. This time,
instead of solving its sub-problem independently, it tries to
solve its sub-problem by compromising its solutions with those of its
neighboring agents. The best solution for one agent may not
be the best one for another. If there is any conflict among the
agents, each agent is required to compromise its solutions
with its neighbors to reach a consensus. If a consensus in
picking solutions is reached through compromising, the system
reports it as a solution to the original problem. Otherwise, the
system iterates until a consensus is reached among the agents
or the number of iterations exceeds some cap.



The very core of cooperation is soft decision making
via solution compromising. Reference [1] formally describes
cooperative optimization in the language of game theory. It is
shown in [1] that different cooperation schemes
yield different computational behaviors of the system. One
of them leads the system to find the Nash equilibria. Another
guarantees that the system has a unique equilibrium. With this scheme,
the system always converges to the equilibrium at an
exponential rate regardless of initial conditions. Theory also
tells us that the equilibrium must be the global optimum if it
is a consensus solution. These results, together with the
theoretical investigation of cooperative optimization, are
detailed in [1].

B. A Simple Example 

Let the cost function (also referred to as energy function or
objective function) to be minimized be $E(x_1, x_2, x_3)$, which
can be expressed as an aggregation

$$E(x_1, x_2, x_3) = f_{12}(x_1, x_2) + f_{23}(x_2, x_3) + f_{13}(x_1, x_3) \tag{1}$$

of three binary sub-functions, $f_{12}(x_1, x_2)$, $f_{23}(x_2, x_3)$, and
$f_{13}(x_1, x_3)$.

To illustrate the decomposition of this problem into simple
sub-problems, we map the cost function onto a graph
(shown in the upper portion of Fig. 1). We can view each
variable as a node in the graph and each binary sub-function as
a connection between two nodes. This graph has one loop, and
we can decompose it into three loop-free sub-graphs, shown
in the lower portion of Fig. 1, one for each variable (double
circled). Each sub-graph is associated with one cost function,
$E_i$, $i = 1, 2, 3$. For example, the sub-graph for variable $x_1$ has
its cost function $E_1$ as

$$E_1(x_1, x_2, x_3) = \big(f_{12}(x_1, x_2) + f_{13}(x_1, x_3)\big)/2.$$

The cost functions of the sub-graphs for the other two
variables are defined similarly:

$$E_2(x_1, x_2, x_3) = \big(f_{23}(x_2, x_3) + f_{12}(x_1, x_2)\big)/2,$$
$$E_3(x_1, x_2, x_3) = \big(f_{13}(x_1, x_3) + f_{23}(x_2, x_3)\big)/2.$$

Obviously,

$$E = E_1 + E_2 + E_3.$$

With such a decomposition, the original problem, $\min E$,
becomes three sub-problems, $\min E_i$, $i = 1, 2, 3$.

For the $i$th sub-problem, the preferences for picking values
for variable $x_i$ are used as the soft decisions for solving the
sub-problem. Those preferences are measured by some real
values and are described as a function of $x_i$, denoted as $c_i(x_i)$.
It is also called the assignment constraint for variable $x_i$.
The different function values, $c_i(x_i)$, stand for the different
preferences in picking values for variable $x_i$. Because we
are dealing with minimizing $E$, for the convenience of the
mathematical manipulation, we choose to use smaller function
values $c_i(x_i)$ for more preferable variable values.




Fig. 1. Illustration of decomposing a graph with loop(s) into sub-graphs
of tree-like structures.

To introduce cooperation in solving the sub-problems, we
iteratively update the assignment constraints (soft decisions in
assigning variables) as

$$c_1^{(k)}(x_1) = \min_{x_2, x_3} \Big( (1 - \lambda_k) E_1 + \lambda_k \sum_j w_{1j} c_j^{(k-1)}(x_j) \Big), \tag{2}$$

$$c_2^{(k)}(x_2) = \min_{x_1, x_3} \Big( (1 - \lambda_k) E_2 + \lambda_k \sum_j w_{2j} c_j^{(k-1)}(x_j) \Big), \tag{3}$$

$$c_3^{(k)}(x_3) = \min_{x_1, x_2} \Big( (1 - \lambda_k) E_3 + \lambda_k \sum_j w_{3j} c_j^{(k-1)}(x_j) \Big), \tag{4}$$

where $k$ is the iteration step and $w_{ij}$ are non-negative weight
values satisfying $\sum_i w_{ij} = 1$. It has been found [1] that
such a choice of $w_{ij}$ ensures that the iterative update functions
converge.

Parameter $\lambda_k$ controls the level of cooperation at step $k$
and is called the cooperation strength, satisfying $0 \le \lambda_k < 1$.
A higher value of $\lambda_k$ weighs the solutions of the
other sub-problems, $c_j(x_j)$, more than that of the current
sub-problem, $E_i$. In other words, the solution of each
sub-problem compromises more with the solutions of the other
sub-problems. As a consequence, a higher level of cooperation
in the optimization is reached in this case.
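
The update rules (2)-(4) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the numeric tables for `f12`, `f23`, and `f13`, the uniform weights $w_{ij} = 1/3$, the constant cooperation strength 0.5, the zero initial soft decisions, and all helper names are assumptions chosen for the example.

```python
import itertools

DOMAIN = (0, 1)

# Hypothetical binary sub-functions f12, f23, f13 (illustrative numbers only).
f12 = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
f23 = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
f13 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 0.0}

# Sub-costs E_1, E_2, E_3: each edge is shared equally by its two end nodes.
E = [
    lambda x: (f12[x[0], x[1]] + f13[x[0], x[2]]) / 2,
    lambda x: (f23[x[1], x[2]] + f12[x[0], x[1]]) / 2,
    lambda x: (f13[x[0], x[2]] + f23[x[1], x[2]]) / 2,
]


def total_cost(x):
    """E = E_1 + E_2 + E_3."""
    return sum(E_i(x) for E_i in E)


def cooperative_step(c_prev, w, lam):
    """Apply updates (2)-(4) once; c_prev[i][v] holds c_i^(k-1)(v)."""
    n = len(E)
    c_new = []
    for i in range(n):
        c_i = {}
        for v in DOMAIN:
            best = float("inf")
            # Minimize over all variables other than x_i, with x_i fixed to v.
            for rest in itertools.product(DOMAIN, repeat=n - 1):
                x = rest[:i] + (v,) + rest[i:]
                coop = sum(w[i][j] * c_prev[j][x[j]] for j in range(n))
                best = min(best, (1 - lam) * E[i](x) + lam * coop)
            c_i[v] = best
        c_new.append(c_i)
    return c_new


def run(num_iters=20, lam=0.5):
    n = len(E)
    w = [[1.0 / n] * n for _ in range(n)]             # columns sum to one
    c = [{v: 0.0 for v in DOMAIN} for _ in range(n)]  # c^(0) = 0
    for _ in range(num_iters):
        c = cooperative_step(c, w, lam)
    # Candidate solution: the most preferred value of each variable.
    return tuple(min(DOMAIN, key=c_i.get) for c_i in c)


candidate = run()
brute_force = min(itertools.product(DOMAIN, repeat=3), key=total_cost)
print(candidate, brute_force)
```

In this toy instance the candidate solution can be compared against exhaustive search over the eight assignments; the example is only meant to make the data flow of the updates concrete.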

The update functions, (2), (3), and (4), are a set of difference
equations of the assignment constraints $c_i(x_i)$. Unlike
conventional difference equations used by probabilistic relaxation
algorithms [3] and Hopfield networks [4], this set of
difference equations always has one and only one equilibrium
given $\lambda$ and $w_{ij}$. Some important properties of this cooperative
optimization will be shown in the following subsections.

C. Cooperative Optimization in a General Form 

Let $E(x_1, x_2, \ldots, x_n)$ be a multivariate cost function, or
simply denoted as $E(x)$, where each variable $x_i$ has a finite
domain $D_i$ of size $m_i$ ($m_i = |D_i|$). We break the function
into $n$ sub-cost functions $E_i$ ($i = 1, 2, \ldots, n$), one for
each variable, such that $E_i$ contains at least variable $x_i$, the
minimization of each cost function $E_i$ (the sub-problem) is
computationally manageable in complexity, and

$$E(x) = \sum_{i=1}^{n} E_i(x). \tag{5}$$

The cooperative optimization is defined by the following set
of difference equations:

$$c_i^{(k)}(x_i) = \min_{x_j \in X_i \setminus x_i} \Big( (1 - \lambda_k) E_i + \lambda_k \sum_j w_{ij} c_j^{(k-1)}(x_j) \Big). \tag{6}$$

Here the minimization runs over the variables in $X_i$, the set of
variables contained in $E_i$, other than $x_i$.

Intuitively, we might choose $w_{ij}$ such that it is non-zero if
$x_j$ is contained in $E_i$. However, theory tells us that this is too
restrictive. To make the algorithm work, we need to choose
$(w_{ij})_{n \times n}$ to be a propagation matrix defined as follows:

Definition 2.1: A propagation matrix $W = (w_{ij})_{n \times n}$ is an
irreducible, nonnegative, real-valued square matrix that satisfies

$$\sum_{i=1}^{n} w_{ij} = 1, \quad \text{for } 1 \le j \le n.$$

A matrix $W$ is called reducible if there exists a permutation
matrix $P$ such that $P W P^T$ has the block form

$$\begin{pmatrix} A & B \\ O & C \end{pmatrix}.$$
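
Definition 2.1 can be checked mechanically for a given weight matrix. The sketch below is one way to do so, under the standard fact that a nonnegative square matrix is irreducible exactly when the directed graph with an edge from $i$ to $j$ for every $w_{ij} > 0$ is strongly connected. The function name `is_propagation_matrix` and the tolerance are hypothetical choices, not taken from [1].

```python
def is_propagation_matrix(W, tol=1e-9):
    """Check Definition 2.1 for a square matrix given as a list of rows."""
    n = len(W)
    if n == 0 or any(len(row) != n for row in W):
        return False
    # Nonnegativity.
    if any(w < 0 for row in W for w in row):
        return False
    # Every column must sum to one: sum_i w_ij = 1 for each j.
    for j in range(n):
        if abs(sum(W[i][j] for i in range(n)) - 1.0) > tol:
            return False

    def reaches_all(has_edge):
        """Depth-first search from node 0 over the given edge relation."""
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in range(n):
                if v not in seen and has_edge(u, v):
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    # Strongly connected: node 0 reaches every node, and every node reaches 0.
    forward = reaches_all(lambda u, v: W[u][v] > 0)
    backward = reaches_all(lambda u, v: W[v][u] > 0)
    return forward and backward


uniform = [[1 / 3] * 3 for _ in range(3)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_propagation_matrix(uniform), is_propagation_matrix(identity))
```

The identity matrix has unit column sums but is reducible, so it fails the check; it corresponds to agents that never exchange soft decisions.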

Definition 2.2: The system is said to reach a consensus
solution if, for any $i$ and $j$ where $E_j$ contains $x_i$,

$$\arg\min_{x_i} \tilde{E}_i = \arg\min_{x_i} \tilde{E}_j,$$

where $\tilde{E}_i$ is defined as

$$\tilde{E}_i = (1 - \lambda_k) E_i + \lambda_k \sum_j w_{ij} c_j^{(k-1)}(x_j).$$



Definition 2.3: A solution to the difference equations (6)
is called an equilibrium of the system. Specifically, it is a
set of values for all the assignment constraints (the soft
decisions), $(c_1(x_1), c_2(x_2), \ldots, c_n(x_n))$, such that the difference
equations are satisfied.

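To simplify the notations in the following discussions, let
$c^{(k)} = (c_1^{(k)}(x_1), c_2^{(k)}(x_2), \ldots, c_n^{(k)}(x_n))$.
Let $\tilde{x}_i^{(k)} = \arg\min_{x_i} c_i^{(k)}(x_i)$, the favorable value for
assigning variable $x_i$. Let $\tilde{x}^{(k)} = (\tilde{x}_1^{(k)}, \tilde{x}_2^{(k)}, \ldots, \tilde{x}_n^{(k)})$. It is the
candidate solution obtained by the cooperative algorithm at
iteration $k$.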

D. Some Important Properties 

The theoretical understanding of the cooperative optimization
has been given in detail in [1]. Here we list some
important properties.

The following theorem shows that $c_i^{(k)}(x_i)$ for $x_i \in D_i$ have
a direct relationship to the lower bound on the cost function
$E(x)$.

Theorem 2.1: Given any propagation matrix $W$ and the
general initial condition $c^{(0)} = 0$ or $\lambda_1 = 0$, $\sum_i c_i^{(k)}(x_i)$
is a lower bound function on $E(x_1, \ldots, x_n)$, denoted as
$E_-^{(k)}(x_1, \ldots, x_n)$. That is,

$$\sum_i c_i^{(k)}(x_i) \le E(x_1, x_2, \ldots, x_n), \quad \text{for any } k \ge 1. \tag{7}$$

In particular, let $E_-^{*(k)} = \sum_i c_i^{(k)}(\tilde{x}_i^{(k)})$; then $E_-^{*(k)}$ is a lower
bound on the optimal cost $E^*$, that is, $E_-^{*(k)} \le E^*$.

Here, subscript "-" in $E_-^{*(k)}$ indicates that it is a lower bound
on $E^*$.

This theorem tells us that $\sum_i c_i^{(k)}(x_i)$ provides a lower
bound on the cost function $E$. We will show in the next
theorem that this lower bound is guaranteed not to get worse
as the iteration proceeds.

Theorem 2.2: Given any propagation matrix $W$, a constant
cooperation strength $\lambda$, and the general initial condition $c^{(0)} = 0$,
$\{E_-^{*(k)} \mid k \ge 0\}$ is a non-decreasing sequence with upper bound
$E^*$.
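
The lower-bound inequality (7) can be verified by a short induction in the case $\lambda_1 = 0$. The following is only a sketch of that argument, written from the definitions in this section; it is not the proof given in [1], and it uses nothing beyond $\lambda_k \ge 0$ and the column-sum condition $\sum_i w_{ij} = 1$ of Definition 2.1.

```latex
% Base case (k = 1, \lambda_1 = 0): a minimum is no larger than the value at any x.
\begin{align*}
\sum_i c_i^{(1)}(x_i)
  &= \sum_i \min_{x_j \in X_i \setminus x_i} E_i
   \le \sum_i E_i(x) = E(x).
\end{align*}
% Inductive step: assume \sum_j c_j^{(k-1)}(x_j) \le E(x) for every x.
\begin{align*}
c_i^{(k)}(x_i)
  &\le (1 - \lambda_k) E_i(x) + \lambda_k \sum_j w_{ij}\, c_j^{(k-1)}(x_j), \\
\sum_i c_i^{(k)}(x_i)
  &\le (1 - \lambda_k) \sum_i E_i(x)
     + \lambda_k \sum_j \Big( \sum_i w_{ij} \Big) c_j^{(k-1)}(x_j) \\
  &= (1 - \lambda_k) E(x) + \lambda_k \sum_j c_j^{(k-1)}(x_j) \\
  &\le (1 - \lambda_k) E(x) + \lambda_k E(x) = E(x).
\end{align*}
```

Evaluating the resulting bound at a global minimizer of $E$ and then minimizing each $c_i^{(k)}$ separately gives $E_-^{*(k)} \le E^*$.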

If a consensus solution is found at some step or steps, then
we can bound how close the consensus solution is to
the global optimum in cost. If the algorithm converges to
a consensus solution, then that solution must also be the global optimum.
The following theorem makes these points precise.

Theorem 2.3: Given any propagation matrix $W$ and the
general initial condition $c^{(0)} = 0$ or $\lambda_1 = 0$, if a consensus
solution $\tilde{x}$ is found at iteration step $k_1$ and remains the same
from step $k_1$ to step $k_2$, then the closeness between the cost
of $\tilde{x}$, $E(\tilde{x})$, and the optimal cost, $E^*$, satisfies the following
two inequalities,

$$0 \le E(\tilde{x}) - E^* \le \Big( \prod_{k=k_1}^{k_2} \lambda_k \Big) \big( E(\tilde{x}) - E_-^{*(k_1-1)} \big), \tag{8}$$

$$0 \le E(\tilde{x}) - E^* \le \frac{\prod_{k=k_1}^{k_2} \lambda_k}{1 - \prod_{k=k_1}^{k_2} \lambda_k} \big( E^* - E_-^{*(k_1-1)} \big), \tag{9}$$

where $(E^* - E_-^{*(k_1-1)})$ is the difference between the optimal
cost $E^*$ and the lower bound on the optimal cost, $E_-^{*(k_1-1)}$,
obtained at step $k_1 - 1$. When $k_2 - k_1 \to \infty$ and $1 - \lambda_k \ge \epsilon > 0$
for $k_1 \le k \le k_2$, $E(\tilde{x}) \to E^*$.

The performance of the cooperative algorithm further depends
on the dynamic behavior of the difference equations (6).
Its convergence property is revealed in the following two
theorems. The first one shows that, given any propagation
matrix and a constant cooperation strength, there exists a
solution satisfying the difference equations (6). The second one
shows that the cooperative algorithm converges exponentially
to that solution.

Theorem 2.4: Given any symmetric propagation matrix
$W$ and a constant cooperation strength $\lambda$, Difference
Equations (6) have one and only one solution, denoted as
$(c_i^{(\infty)}(x_i))$ or simply $c^{(\infty)}$.

Theorem 2.5: Given any symmetric propagation matrix $W$
and a constant cooperation strength $\lambda$, the cooperative
algorithm, with any choice of the initial condition $c^{(0)}$, converges
to $c^{(\infty)}$ with an exponential convergence rate $\lambda$. That is,

$$\| c^{(k)} - c^{(\infty)} \|_\infty \le \lambda^k \| c^{(0)} - c^{(\infty)} \|_\infty. \tag{10}$$



This theorem is called the convergence theorem. It indicates
that our cooperative algorithm is stable and has a unique
attractor, $c^{(\infty)}$. Hence, the evolution of our cooperative algorithm
is robust and insensitive to perturbations, and the final solution of
the algorithm is independent of initial conditions. In contrast,
conventional algorithms based on iterative improvement (e.g.,
gradient descent) have many local attractors due to the local
minima problem. The evolution of these algorithms is sensitive
to perturbations, and the final solutions of these algorithms
are dependent on initial conditions.
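
Independence from initial conditions can be observed numerically on a small instance. The sketch below runs the difference equations (6) from two random starting points and records the sup-norm gap between the two trajectories, which should shrink roughly like $\lambda^k$ if both are drawn to the same attractor. The three-variable sub-costs, the uniform symmetric weights, the value $\lambda = 0.6$, the random seed, and all function names are assumptions made for illustration.

```python
import itertools
import random

DOMAIN = (0, 1)
N = 3
ASSIGNMENTS = list(itertools.product(DOMAIN, repeat=N))


def agree(a, b):
    """Hypothetical pairwise cost: zero when two variables agree."""
    return 0.0 if a == b else 1.0


# Hypothetical sub-costs for a three-variable loop (illustrative only).
SUB_COSTS = [
    lambda x: (agree(x[0], x[1]) + agree(x[0], x[2])) / 2,
    lambda x: (agree(x[1], x[2]) + agree(x[0], x[1])) / 2 + 0.3 * x[1],
    lambda x: (agree(x[0], x[2]) + agree(x[1], x[2])) / 2,
]

# Symmetric weights whose rows and columns all sum to one.
W = [[1.0 / N] * N for _ in range(N)]


def update(c, lam):
    """Apply the difference equations (6) once to the soft decisions c."""
    return [
        {
            v: min(
                (1 - lam) * SUB_COSTS[i](x)
                + lam * sum(W[i][j] * c[j][x[j]] for j in range(N))
                for x in ASSIGNMENTS
                if x[i] == v
            )
            for v in DOMAIN
        }
        for i in range(N)
    ]


def sup_distance(c_a, c_b):
    """Sup-norm distance between two sets of soft decisions."""
    return max(abs(c_a[i][v] - c_b[i][v]) for i in range(N) for v in DOMAIN)


def gap_history(lam=0.6, num_iters=25, seed=0):
    rng = random.Random(seed)
    c_a = [{v: rng.uniform(-5, 5) for v in DOMAIN} for _ in range(N)]
    c_b = [{v: rng.uniform(-5, 5) for v in DOMAIN} for _ in range(N)]
    gaps = [sup_distance(c_a, c_b)]
    for _ in range(num_iters):
        c_a, c_b = update(c_a, lam), update(c_b, lam)
        gaps.append(sup_distance(c_a, c_b))
    return gaps


gaps = gap_history()
print(gaps[0], gaps[-1])
```

Comparing `gaps[k]` with `0.6 ** k * gaps[0]` gives a quick sanity check of the exponential rate on this toy problem.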

III. Decoding LDPC via Cooperative Optimization 

A. LDPC codes 

LDPC codes belong to a special class of linear block codes
whose parity check matrix $H$ has a low density of ones. LDPC
codes were originally introduced by Gallager in his thesis [5].
After the discovery of turbo codes in 1993 by Berrou et al. [6],
LDPC codes were rediscovered by MacKay and Neal [7] in
1995. Both classes perform excellently, with
error correction close to the Shannon limit.

The parity check matrix $H$ is a binary matrix with elements
in $\{0, 1\}$. It is sparse, with few non-zero elements. Let the
code word length be $n$ and the input data be

$$x = (x_1, x_2, \ldots, x_n);$$

then $H$ is a $k \times n$ matrix, where $k$ is the number of rows.
Each row of $H$, denoted as $H_j$, introduces one parity check
constraint on $x$,

$$H_j x^T = 0 \bmod 2.$$

Since $H$ has $k$ rows, there are $k$ constraints on $x$. That is,

$$H x^T = 0 \bmod 2.$$
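
The parity check constraints are easy to evaluate directly. The sketch below computes $H x^T \bmod 2$ for a small matrix; the matrix is that of a (7, 4) Hamming code, used here only as a hypothetical toy example (it is far too small and dense to be an LDPC code), and the function names are illustrative.

```python
# Toy parity check matrix with k = 3 rows and n = 7 columns (illustrative only).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(H, x):
    """Return H x^T mod 2, one entry per parity check row H_j."""
    return [sum(h * b for h, b in zip(row, x)) % 2 for row in H]


def is_codeword(H, x):
    """x is a code word exactly when every parity check is satisfied."""
    return not any(syndrome(H, x))


word = [1, 1, 1, 0, 0, 0, 0]
corrupted = [0, 1, 1, 0, 0, 0, 0]  # first bit flipped
print(is_codeword(H, word), syndrome(H, corrupted))
```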

B. Maximum-Likelihood Decoding 

To minimize the probability of decoding error, the optimal
decoder for a channel code finds an input $x$ that has the
maximum posterior probability $P\{x | y\}$ given an output $y$.
Usually, we assume a uniform prior distribution on $x$. In this
case, the maximum a posteriori criterion reduces to maximum
likelihood, i.e., finding an input $x$ that maximizes the likelihood
$P\{y | x\}$.

For a discrete memoryless additive Gaussian channel and
binary modulation, the output data bit at position $i$, $y_i$, can be
modeled as the following random variable:

$$y_i = (2x_i - 1) + \xi_i,$$

where $x_i \in \{0, 1\}$ and $\xi_i$ is additive noise with a Gaussian
distribution of variance $\sigma^2$.
Let

$$\alpha_i = \log \frac{P\{y_i | x_i = 1\}}{P\{y_i | x_i = 0\}},$$

where $P\{y_i | x_i\}$, $x_i = 0, 1$, is the conditional distribution of
output data bit $y_i$ given the input data bit $x_i$. In this case, the
maximum likelihood decoding becomes

$$\max_x \sum_i \alpha_i (2x_i - 1), \quad \text{s.t. } H x^T = 0 \bmod 2. \tag{11}$$

This is a constrained maximization problem. Without loss
of generality, we can transform it into an unconstrained
minimization problem in a more general form. To do that,
we introduce unary constraints on variables $x_i$,

$$f_i(x_i) = \begin{cases} -2\alpha_i & \text{if } x_i = 1, \\ 2\alpha_i & \text{if } x_i = 0, \end{cases}$$

and convert each parity check constraint to an $m$-ary constraint
on the variables of the constraint. Let the $j$th parity check
constraint, $H_j x^T = 0 \bmod 2$, be defined on a subset of variables,
denoted as $X_j$, of size $|X_j| = m_j$. Then the $m$-ary constraint
on $X_j$ is defined as

$$f_{X_j}(X_j) = \begin{cases} 0 & \text{if } H_j x^T = 0 \bmod 2, \\ \infty & \text{otherwise.} \end{cases}$$

Using those definitions, (11) becomes

$$\min_x \sum_i f_i(x_i) + \sum_j f_{X_j}(X_j). \tag{12}$$
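
To make problem (12) concrete, the sketch below solves it by exhaustive search on a tiny code. It is a reference illustration only: exhaustive search is exponential in $n$ and is not the cooperative algorithm. Two things are assumptions rather than statements from the text: the closed form $\alpha_i = 2 y_i / \sigma^2$, which follows from substituting the Gaussian densities of the channel model $y_i = (2x_i - 1) + \xi_i$ into the definition of $\alpha_i$, and the toy (7, 4) Hamming parity check matrix. All function names are hypothetical.

```python
import itertools
import math

# Toy parity check matrix (a (7, 4) Hamming code, not a real LDPC code).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def llr(y, sigma):
    """alpha_i = log P(y_i | x_i = 1) / P(y_i | x_i = 0) = 2 y_i / sigma^2."""
    return [2.0 * y_i / sigma ** 2 for y_i in y]


def unary(alpha_i, x_i):
    """Unary constraint f_i(x_i)."""
    return -2.0 * alpha_i if x_i == 1 else 2.0 * alpha_i


def parity(row, x):
    """m-ary constraint: 0 if the parity check holds, infinity otherwise."""
    return 0.0 if sum(h * b for h, b in zip(row, x)) % 2 == 0 else math.inf


def cost(x, alpha):
    """Objective of problem (12) for one candidate x."""
    return (sum(unary(a, b) for a, b in zip(alpha, x))
            + sum(parity(row, x) for row in H))


def ml_decode(y, sigma):
    """Exhaustive minimization of (12); feasible only for very short codes."""
    alpha = llr(y, sigma)
    candidates = itertools.product((0, 1), repeat=len(y))
    return min(candidates, key=lambda x: cost(x, alpha))


sent = (1, 1, 1, 0, 0, 0, 0)
received = [2 * b - 1 for b in sent]
received[2] = -0.2  # noise pushes the third sample across the decision threshold
decoded = ml_decode(received, sigma=0.8)
print(decoded)
```

A bit-by-bit hard decision on `received` would violate a parity check, whereas the minimizer of (12) is forced onto a code word by the infinite penalty.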



In general, the above problem is called constraint-based
optimization, which is NP-hard. It is a core problem in
mathematical logic and computing theory. In practice, it is 
fundamental in solving many problems in machine vision, 
image processing, computational chemistry, integrated circuit 
design, computer network design, artificial intelligence and 
more. 

C. Decomposing into Sub-problems 

The Tanner graph is used to help us understand the decomposition.
A Tanner graph for an LDPC code is a bipartite graph
with variable nodes on one side and constraint nodes (parity 
check nodes) on the other side. Edges in the graph connect 
constraint nodes to variable nodes. A constraint node connects 
to those variable nodes that participate in its parity check. A 
variable node connects to those constraint nodes that use the 
variable in the parity checks. 

The Tanner graph can be decomposed into $n$ tree-like
sub-graphs, one for each variable. Those sub-graphs can
overlap. Because of their tree-like structures, we can find the
exact solutions of the sub-problems associated with those
sub-graphs. There are many ways of decomposing a graph, which
lead to different performances of the cooperative algorithm.
A simple, straightforward way of decomposition is to have 
the sub-graph of each variable node consisting of all the 
constraint nodes linked to the variable node, together with their 
connections, all the variable nodes linked to those constraint 
nodes, together with their connections, and the variable node 
itself. 
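
The simple decomposition just described can be sketched as follows. For each variable node, the code collects the constraint nodes linked to it and all variable nodes linked to those constraint nodes. The dictionary layout, the function name `tanner_subgraphs`, and the small matrix in the usage example are illustrative assumptions, not part of the decoder described in this paper.

```python
def tanner_subgraphs(H):
    """Build one sub-graph per variable node from a parity check matrix H.

    H is a list of rows; H[j][i] == 1 means constraint j uses variable i.
    """
    k, n = len(H), len(H[0])
    # Adjacency lists of the bipartite Tanner graph.
    checks_of = [[j for j in range(k) if H[j][i]] for i in range(n)]
    vars_of = [[i for i in range(n) if H[j][i]] for j in range(k)]

    subgraphs = []
    for i in range(n):
        checks = checks_of[i]
        # Variables reached through those checks, plus the root itself.
        variables = sorted({v for j in checks for v in vars_of[j]} | {i})
        subgraphs.append({"root": i, "checks": checks, "variables": variables})
    return subgraphs


# Hypothetical 3 x 6 parity check matrix used only to exercise the function.
H_small = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
subgraphs = tanner_subgraphs(H_small)
print(subgraphs[0])
```

Neighboring sub-graphs share variables, which is where the soft decisions of one agent enter the sub-problem of another.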

Fig. 2. Decoding an LDPC code using SPA (the sum-product algorithm)
and CA (the cooperative algorithm).



IV. Experimental Results 

We developed high-performance methods and apparatus (US patent
pending) for decoding Turbo codes and LDPC codes
using the cooperative algorithm. Usually, short LDPC codes
have high commercial value because their decoding time is
also short. It has also been found that irregular LDPC codes
perform better than regular ones [8].

A candidate code for China HDTV (proposed by the author
using a new way of code construction called quantum coding)
is a (7493, 4572) irregular LDPC code with a data rate of 0.61.
China decided to use LDPC codes instead of Turbo codes
for channel coding because of their higher coding gains and
lower decoding complexity. Fig. 3 shows the performance
of the cooperative algorithm and the sum-product algorithm
in decoding the LDPC code using 10,000 code words and
an AWGN (additive white Gaussian noise) channel.

The maximum number of iterations for the sum-product
algorithm is 30. We found little improvement
in decoding quality beyond 30 iterations. An error floor
was observed at BERs below $10^{-4}$ using the sum-product
algorithm. The acceptable error rate for China HDTV is
below $10^{-9}$. The sum-product algorithm cannot achieve that even
when Eb/No is higher than 2.0 dB.

The error floor problem was completely removed by the
cooperative algorithm. The error rate dropped to zero once
Eb/No was higher than 1.7 dB. In the "waterfall" region,
the cooperative algorithm reduced the decoding error rates
further by more than a factor of three. For the cooperative algorithm,
the maximum number of iterations is 120, which is affordable mainly
because its computation is much less complex. Even with that number,
it was still more than six times faster than the sum-product
algorithm.

In the second example, we use a regular LDPC code to
demonstrate that the cooperative algorithm depends much less
on the code structure than the sum-product
algorithm. The code is the Turbo-like product of a simple
parity check code (8, 7),

$$D_1\, D_2\, D_3\, D_4\, D_5\, D_6\, D_7\, P_8.$$

The configuration of the code is a 5-dimensional cube, $(8, 7)^5$.
The block size is 32768, the data size is 16807, and the
data rate is 0.513. LDPC codes of this kind are the simplest in
structure and the easiest to encode (but do not necessarily have the
best code distances). The sum-product algorithm performs poorly
in decoding this kind of code due to the high
regularity of the code structure. Fig. 2 shows the performance
of both algorithms using 100 code words and the AWGN
channel. The cooperative algorithm was much better than
the sum-product algorithm in this case. The success of the
cooperative algorithm in decoding this type of LDPC code
implies that we can have greater flexibility in constructing
high-performance codes without worrying too much about the
limitations of decoding algorithms.
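
The block size, data size, and data rate quoted for the second example follow directly from the product construction, assuming each of the five dimensions uses the same (8, 7) parity check component code. The helper name below is hypothetical.

```python
def product_code_params(n, k, dims):
    """Block size, data size, and rate of a dims-dimensional (n, k) product code."""
    block_size = n ** dims
    data_size = k ** dims
    return block_size, data_size, data_size / block_size


block_size, data_size, rate = product_code_params(8, 7, 5)
print(block_size, data_size, round(rate, 3))
```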

V. Conclusion 

We have presented the application of a new optimization
technique called cooperative optimization to decoding LDPC
codes. Like the well-known sum-product algorithm, the
cooperative algorithm is based on iterative message passing
on the Tanner graph. Although similar in operation,
the two algorithms are derived from different principles.

The sum-product algorithm is a generalization of the belief
propagation algorithm [9] used in AI. It can find exact
solutions when the graph it operates on has no cycles. With
cycles, we still lack a theoretical understanding of the behavior
of the algorithm. Unlike many conventional methods, cooperative
optimization has a solid theoretical foundation for many of its
computational properties. In our experiments, it significantly
outperformed the sum-product algorithm in both efficiency
and accuracy for decoding different LDPC codes. The new
cooperative decoding algorithm can be extended further from
the min-sum semiring to other semirings, as has been done
for the sum-product algorithm [2] and the general distributive
law [10].